Conversation
Performance Measurements ⏳
|
There was a problem hiding this comment.
Pull request overview
This PR updates the snapshot pinger used by snapct to use TCP connection attempts (instead of ICMP) to estimate peer reachability/latency, and adjusts the snapct tile’s sandboxing configuration to accommodate the new ping mechanism.
Changes:
- Replace ICMP echo handling in
fd_sspingwith a pool of nonblocking TCP sockets andpoll()-based completion tracking. - Update
snapctsandbox allowlists/seccomp policy to account for many ping fds (min/max fd range) and allowpoll. - Adjust
snapctfd budgeting and allowed-fd population to include all ping sockets.
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/discof/restore/utils/fd_ssping.h | Adds FD_SSPING_FD_CNT and replaces single-socket API with multi-fd enumeration API. |
| src/discof/restore/utils/fd_ssping.c | Implements TCP-based “ping” via nonblocking connect() and poll(), plus fd pool bookkeeping. |
| src/discof/restore/generated/fd_snapct_tile_seccomp.h | Regenerates seccomp filter to allow poll and accept a ping-fd range (min/max). |
| src/discof/restore/fd_snapct_tile.seccomppolicy | Updates policy inputs and rules to account for multiple ping fds and poll. |
| src/discof/restore/fd_snapct_tile.c | Expands fd budgeting and allowlist population to include all ping fds; passes ping fd range into seccomp generator. |
| deadline_list_ele_remove( peer->state==PEER_STATE_PINGED ? ssping->pinged : ssping->refreshing, peer, ssping->pool ); | ||
| int is_err = ssping->used_fds[ i ].revents & (POLLRDHUP|POLLERR|POLLHUP); | ||
| if( FD_LIKELY( !is_err ) ) { | ||
| peer->latency_nanos = (ulong)fd_long_max( now - (peer->deadline_nanos - PEER_DEADLINE_NANOS_PING), 1L ); | ||
| peer->state = PEER_STATE_VALID; | ||
| peer->deadline_nanos = now + PEER_DEADLINE_NANOS_VALID; | ||
| deadline_list_ele_push_tail( ssping->valid, peer, ssping->pool ); | ||
|
|
||
| FD_LOG_INFO(( "pinged " FD_IP4_ADDR_FMT ":%hu in %lu nanos", | ||
| FD_IP4_ADDR_FMT_ARGS( peer->addr.addr ), fd_ushort_bswap( peer->addr.port ), peer->latency_nanos )); | ||
| ssping->on_ping_cb( ssping->cb_arg, peer->addr, peer->latency_nanos ); | ||
| } else { |
There was a problem hiding this comment.
recv_pings() treats a socket as successfully "pinged" when poll revents does not include POLLERR/POLLHUP/POLLRDHUP, but for nonblocking connect() the reliable way to determine success/failure is to query SO_ERROR via getsockopt(SOL_SOCKET, SO_ERROR). Some connect failures can present as POLLOUT without POLLERR, which would incorrectly mark an unreachable peer as VALID and skew peer selection. Suggest: on POLLOUT, call getsockopt(SO_ERROR) and treat any nonzero error as invalid (and only compute latency / call on_ping_cb when SO_ERROR==0).
| switch( peer->state ) { | ||
| case PEER_STATE_UNPINGED: | ||
| deadline_list_ele_remove( ssping->unpinged, peer, ssping->pool ); | ||
| break; | ||
| case PEER_STATE_PINGED: | ||
| deadline_list_ele_remove( ssping->pinged, peer, ssping->pool ); | ||
| break; | ||
| case PEER_STATE_VALID: | ||
| deadline_list_ele_remove( ssping->valid, peer, ssping->pool ); | ||
| break; | ||
| case PEER_STATE_REFRESHING: | ||
| deadline_list_ele_remove( ssping->refreshing, peer, ssping->pool ); | ||
| break; | ||
| case PEER_STATE_INVALID: | ||
| return; | ||
| } | ||
| peer->state = PEER_STATE_INVALID; | ||
| peer->deadline_nanos = now + PEER_DEADLINE_NANOS_INVALID; | ||
| deadline_list_ele_push_tail( ssping->invalid, peer, ssping->pool ); |
There was a problem hiding this comment.
fd_ssping_invalidate() can transition a peer from PINGED/REFRESHING to INVALID without releasing its associated ping fd / used_fd_idx. With the new invariant that PINGED/REFRESHING peers are exactly those owning a ping fd, this leaves a stale entry in used_fds/ping_to_pool pointing at a peer that is no longer in those lists (and may later be reused by the pool), which can lead to incorrect behavior or use-after-free. When invalidating a peer that currently has a ping fd, reclaim the fd (e.g., via remove_fdesc_idx(peer->used_fd_idx)) before moving it to INVALID.
| 1UL + /* logfile */ | ||
| 1UL; |
There was a problem hiding this comment.
rlimit_file_cnt() now adds an extra bare + 1UL with no comment or corresponding fd usage. This looks like an accidental overcount and makes it harder to audit the fd budget (especially now that FD_SSPING_FD_CNT is significant). Either remove this term or document exactly which file descriptor it is intended to account for.
| 1UL + /* logfile */ | |
| 1UL; | |
| 1UL; /* logfile */ |
| ulong out_fds_cnt, | ||
| int * out_fds ) { | ||
| if( FD_UNLIKELY( out_fds_cnt<6UL ) ) FD_LOG_ERR(( "out_fds_cnt %lu", out_fds_cnt )); | ||
| if( FD_UNLIKELY( out_fds_cnt<255UL ) ) FD_LOG_ERR(( "out_fds_cnt %lu", out_fds_cnt )); |
There was a problem hiding this comment.
populate_allowed_fds() hard-codes a minimum capacity check of 255. This is tightly coupled to the current FD_SSPING_FD_CNT (249) and the current list of always/optionally included fds, and it will silently become wrong if any of those change. Consider expressing this in terms of FD_SSPING_FD_CNT plus the maximum number of non-ssping fds that can be appended in this function (and accounting for the optional logfile fd) so the bound stays correct as the code evolves.
|
@copilot code review[agent] Does this reliably clean up TCP connection state at the remote end? (RST/FIN sent to free up remote's state) |
|
@ripatel-fd I've opened a new pull request, #8664, to work on those changes. Once the pull request is ready, I'll request review from you. |
|
Based on a pcap, it seems that it does reliably clean up connection state. Scanning the linux source code confirms this:
It sends an RST when in the states the spec recommends doing so. In other states, it closes the socket, and when the host responds with a When you call close(), if there's no unread data, it initiates a TCP graceful shutdown, but if there is unread data, it initiates an abortive shutdown. Calling connect with AF_UNSPEC initiates an abortive shutdown in both cases. |
Closes #8255
Closes #8662
Closes #8661